Programmatic Policy

Multimodal LLM-assisted Evolutionary Search for Programmatic Control Policies

Hu, Qinglong, Tong, Xialiang, Yuan, Mingxuan, Liu, Fei, Lu, Zhichao, Zhang, Qingfu

arXiv.org Artificial Intelligence

Deep reinforcement learning has achieved impressive success in control tasks. However, its policies, represented as opaque neural networks, are often difficult for humans to understand, verify, and debug, which undermines trust and hinders real-world deployment. This work addresses this challenge by introducing a novel approach for programmatic control policy discovery, called Multimodal Large Language Model-assisted Evolutionary Search (MLES). MLES utilizes multimodal large language models as programmatic policy generators, combining them with evolutionary search to automate policy generation. It integrates visual feedback-driven behavior analysis within the policy generation process to identify failure patterns and guide targeted improvements, thereby enhancing policy discovery efficiency and producing adaptable, human-aligned policies. Experimental results demonstrate that MLES achieves performance comparable to Proximal Policy Optimization (PPO) across two standard control tasks while providing transparent control logic and traceable design processes. This approach also overcomes the limitations of predefined domain-specific languages, facilitates knowledge transfer and reuse, and is scalable across various tasks, showing promise as a new paradigm for developing transparent and verifiable control policies.
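The abstract does not spell out the search loop, so the sketch below is only a minimal, self-contained approximation of the idea: an evolutionary search over candidate policy programs in which a multimodal model's analysis of rollout behavior steers mutations. The toy set-point task, the two-gain program template, and the `propose_revision` stub (standing in for the MLLM call that would receive rendered rollout frames) are all hypothetical.

```python
import random

def rollout(params, steps=50):
    """Run a candidate policy on a toy set-point task; return cost and a trace."""
    kp, kd = params
    x, v, cost, trace = 1.0, 0.0, 0.0, []
    for _ in range(steps):
        u = -kp * x - kd * v          # the "program": a fixed PD-style control rule
        v += 0.1 * u
        x += 0.1 * v
        cost += x * x
        trace.append(x)
    return cost, trace

def propose_revision(params, trace):
    """Stub for the MLLM step: in MLES a multimodal model inspects rendered
    rollouts and suggests targeted program edits; here we merely perturb the
    gains, nudging harder when the trace still sits far from the set point."""
    scale = 0.5 if max(abs(x) for x in trace[-10:]) > 0.1 else 0.1
    return [p + random.gauss(0.0, scale) for p in params]

def evolve(pop_size=8, generations=30):
    pop = [[random.uniform(0, 2), random.uniform(0, 2)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda p: rollout(p)[0])
        elite = scored[: pop_size // 2]              # keep the best half
        children = [propose_revision(p, rollout(p)[1]) for p in elite]
        pop = elite + children
    return min(pop, key=lambda p: rollout(p)[0])

if __name__ == "__main__":
    best = evolve()
    print("best gains:", best, "cost:", rollout(best)[0])
```

In MLES proper, the proposal step would return edited program text rather than perturbed gains; the stub only preserves the feedback-driven structure of the loop.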




Common Benchmarks Undervalue the Generalization Power of Programmatic Policies

Rajabpour, Amirhossein, Aghakasiri, Kiarash, Zilles, Sandra, Lelis, Levi H. S.

arXiv.org Artificial Intelligence

Algorithms for learning programmatic representations for sequential decision-making problems are often evaluated on out-of-distribution (OOD) problems, with the common conclusion that programmatic policies generalize better than neural policies on OOD problems. In this position paper, we argue that commonly used benchmarks undervalue the generalization capabilities of programmatic representations. We analyze the experiments of four papers from the literature and show that neural policies, which were previously reported not to generalize, can generalize as effectively as programmatic policies on OOD problems. This is achieved with simple changes to the training pipeline of the neural policies. Namely, we show that simpler neural architectures using the same type of sparse observations as programmatic policies can help attain OOD generalization. Another modification we show to be effective is the use of reward functions that encourage safer policies (e.g., agents that drive slowly generalize better). Finally, we argue for creating benchmark problems that highlight concepts needed for OOD generalization which may challenge neural policies but align with programmatic representations, such as tasks requiring algorithmic constructs like stacks.
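To make the observation change concrete, here is an illustrative sketch of what "the same type of sparse observations as programmatic policies" could look like for a gridworld: a handful of local predicates replacing the full grid image. The feature set, the grid encoding, and the `sparse_features` helper are invented for illustration, not taken from the papers analyzed.

```python
import numpy as np

def sparse_features(grid, agent_pos):
    """Replace a full grid image with the few local predicates a programmatic
    policy typically reads: wall-adjacency in each direction plus the direction
    of the nearest goal. (Encoding is illustrative: 0 empty, 1 wall, 2 goal.)"""
    r, c = agent_pos
    h, w = grid.shape
    def wall(dr, dc):
        rr, cc = r + dr, c + dc
        return 1.0 if not (0 <= rr < h and 0 <= cc < w) or grid[rr, cc] == 1 else 0.0
    goals = np.argwhere(grid == 2)
    if len(goals):
        g = goals[np.abs(goals - [r, c]).sum(axis=1).argmin()]
        dg = np.sign(g - [r, c]).astype(float)
    else:
        dg = np.zeros(2)
    return np.array([wall(-1, 0), wall(1, 0), wall(0, -1), wall(0, 1), dg[0], dg[1]])

grid = np.zeros((5, 7), dtype=int)
grid[2, 3] = 1      # a wall
grid[4, 6] = 2      # the goal
print(sparse_features(grid, (2, 2)))   # 6 numbers instead of a 5x7 image
```

A small network over these six inputs is closer in inductive bias to a programmatic policy than a convolutional network over the raw grid, which is the sense in which the simpler pipeline can recover OOD generalization.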


InnateCoder: Learning Programmatic Options with Foundation Models

Moraes, Rubens O., Sadmine, Quazi Asif, Baier, Hendrik, Lelis, Levi H. S.

arXiv.org Artificial Intelligence

Outside of transfer learning settings, reinforcement learning agents start their learning process from a clean slate. As a result, such agents have to go through a slow process to learn even the most obvious skills required to solve a problem. In this paper, we present InnateCoder, a system that leverages human knowledge encoded in foundation models to provide programmatic policies that encode "innate skills" in the form of temporally extended actions, or options. In contrast to existing approaches to learning options, InnateCoder learns them from the general human knowledge encoded in foundation models in a zero-shot setting, rather than from the knowledge the agent gains by interacting with the environment. InnateCoder then searches for a programmatic policy by combining the programs encoding these options into larger and more complex programs. We hypothesize that InnateCoder's way of learning and using options can improve the sample efficiency of current methods for learning programmatic policies. Empirical results in MicroRTS and Karel the Robot support this hypothesis: InnateCoder is more sample-efficient than versions of the system that do not use options or that learn them from experience.
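A rough sketch of the composition step, under heavy simplification: hand-written Python callables stand in for the model-generated option programs, and a random search composes them into longer programs for an invented toy gridworld (all names and parameters below are hypothetical).

```python
import random

SIZE, WALLS, GOAL = 6, {(1, 2), (2, 2), (3, 2)}, (5, 4)

def step(pos, d):
    nxt = (pos[0] + d[0], pos[1] + d[1])
    return nxt if nxt not in WALLS and all(0 <= v < SIZE for v in nxt) else pos

# Options: temporally extended actions. In InnateCoder these programs come
# zero-shot from a foundation model; here they are hand-written stand-ins.
def move_until_blocked(d):
    def option(pos):
        while pos != GOAL and step(pos, d) != pos:
            pos = step(pos, d)
        return pos
    return option

OPTIONS = [move_until_blocked(d) for d in ((1, 0), (-1, 0), (0, 1), (0, -1))]

def run(program, pos=(0, 0)):
    for option in program:
        pos = option(pos)
    return pos

def search(max_len=4, tries=2000):
    """Compose options into longer programs until one reaches the goal."""
    for _ in range(tries):
        program = random.choices(OPTIONS, k=random.randint(1, max_len))
        if run(program) == GOAL:
            return program
    return None

found = search()
print("program length:", len(found) if found else "not found")
```

The point of the sketch is only that search operates over sequences of options rather than primitive actions, which is what InnateCoder's hypothesis about sample efficiency rests on.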


Reviews: Imitation-Projected Programmatic Reinforcement Learning

Neural Information Processing Systems

This paper addresses the problem of learning programmatic policies, i.e., policies represented in structured classes such as domain-specific programming languages or regression trees. To this end, the paper proposes a "lift-and-project" framework (IPPG) that alternates between (1) optimizing a policy parameterized by a neural network in an unconstrained policy space and (2) projecting the learned policy onto the space of programmatic representations. Specifically, (1) is achieved with deep policy gradient methods (e.g., DDPG, TRPO) and (2) by synthesizing programs that describe the neural policy's behavior (program synthesis via imitation learning). Experiments on TORCS (a simulated car-racing environment) show that the learned programmatic policies outperform both methods that imitate or distill a pre-trained neural policy and DDPG itself.
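A minimal sketch of that alternation, with stand-ins throughout: a toy 1D regulation task, a linear policy playing the role of the neural network, one analytic gradient step as the "lift", and a shallow scikit-learn regression tree (one of the programmatic classes mentioned above) fitted by imitation as the "project" step.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def cost(policy, n=200):
    """Average quadratic cost of a policy on a toy 1D regulation task."""
    x = rng.uniform(-1, 1, n)
    u = policy(x)
    return np.mean((0.9 * x + 0.1 * u) ** 2 + 0.01 * u ** 2)

theta = 0.0            # "neural" policy u = theta * x, a stand-in for a network
tree = None
for it in range(20):
    # (1) Lift: an unconstrained gradient step on the neural policy. IPPG uses
    # DDPG/TRPO and warm-starts from the projected policy; keeping theta across
    # iterations is a simplification.
    x = rng.uniform(-1, 1, 200)
    grad = np.mean(2 * (0.9 * x + 0.1 * theta * x) * 0.1 * x + 0.02 * theta * x ** 2)
    theta -= 0.5 * grad
    # (2) Project: fit a programmatic policy that imitates the neural one; a
    # depth-2 regression tree stands in for program synthesis.
    tree = DecisionTreeRegressor(max_depth=2).fit(x.reshape(-1, 1), theta * x)

print("neural cost:      ", cost(lambda x: theta * x))
print("programmatic cost:", cost(lambda x: tree.predict(x.reshape(-1, 1))))
```

The projection is what keeps the final policy inside the interpretable class while the lift step supplies the optimization signal, which is the division of labor the review describes.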


Programmatic Reinforcement Learning: Navigating Gridworlds

Shabadi, Guruprerana, Fijalkow, Nathanaël, Matricon, Théo

arXiv.org Artificial Intelligence

The field of reinforcement learning (RL) is concerned with algorithms for learning optimal policies in unknown stochastic environments. Programmatic RL studies representations of policies as programs, i.e., representations involving higher-order constructs such as control loops. Despite attracting a lot of attention at the intersection of the machine learning and formal methods communities, very little is known on the theoretical front about programmatic RL: what are good classes of programmatic policies? How large are optimal programmatic policies? How can we learn them? The goal of this paper is to give first answers to these questions, initiating a theoretical study of programmatic RL. Considering a class of gridworld environments, we define a class of programmatic policies. Our main contributions are to place upper bounds on the size of optimal programmatic policies and to construct an algorithm for synthesizing them. These theoretical findings are complemented by a prototype implementation of the algorithm.
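The flavor of the synthesis problem can be illustrated with a toy version of the setting: programs built from "move in direction d until blocked" loops, enumerated in order of size, so the first program that reaches the goal is also a smallest one. This brute-force enumeration is only a stand-in for the paper's algorithm, whose details the abstract does not give; the grid, walls, and names below are invented.

```python
from itertools import product

WALLS = {(2, 1), (2, 2), (2, 3)}
SIZE, START, GOAL = 5, (0, 0), (4, 4)
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}

def run_segment(pos, d):
    """One 'while not blocked: move d' loop, the higher-order construct here."""
    dr, dc = MOVES[d]
    while True:
        nxt = (pos[0] + dr, pos[1] + dc)
        if pos == GOAL or nxt in WALLS or not all(0 <= v < SIZE for v in nxt):
            return pos
        pos = nxt

def synthesize(max_len=5):
    """Enumerate programs in order of size; the first hit is a smallest program,
    mirroring the idea of bounding the size of optimal programmatic policies."""
    for length in range(1, max_len + 1):
        for prog in product(MOVES, repeat=length):
            pos = START
            for d in prog:
                pos = run_segment(pos, d)
            if pos == GOAL:
                return prog
    return None

print(synthesize())   # ('D', 'R') on this grid
```

Size-ordered enumeration is exponential in general; the upper bounds the paper proves are what make bounded search of this kind meaningful.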